Abstract

Based on the Fréchet mean, we define a notion of barycenter corresponding to a
usual notion of statistical mean. We prove the existence of Wasserstein barycenters
of random probabilities defined on a geodesic space $(E,d)$. We also prove the
consistency of this barycenter in a general setting, that includes taking barycenters 
of empirical versions of the probability measures or of a growing set of probability 
measures. 
Introduction

Giving a sense to the notion of the mean of a data sample is one of the major activities
of statisticians. When dealing with complex data that do not possess a Euclidean
structure, merely defining the mean becomes a difficult task. This problem arises
naturally in a wide range of statistical research fields, such as functional data analysis
(see for instance [?] and references therein), image analysis in [27] or [5], and shape
analysis in [19] or [18], with many applications ranging from biology in [12] to
pattern recognition in [?], just to name a few.

When dealing with data that are probability measures, finding a central probability
measure that conveys the information of the whole data set is a difficult task.
This has been tackled in [1] by considering a notion of barycenter with respect to the
Wasserstein distance. This notion coincides with the notion of Fréchet mean. That is,
the Fréchet mean of the points $(x_i)_{1 \le i \le n}$ of a geodesic space $(E,d)$ given weights
$(\lambda_i)_{1 \le i \le n}$ is defined as a minimizer of

$$x \mapsto \sum_{i=1}^{n} \lambda_i \, d^2(x, x_i).$$

This definition provides a natural extension of the barycenter, as it coincides on $\mathbb{R}^d$ with
the barycenter $\sum_{i=1}^{n} \lambda_i x_i$ of the points $(x_i)_{1 \le i \le n}$ with weights $(\lambda_i)_{1 \le i \le n}$.
The function to be minimized can be rewritten as

$$x \mapsto \mathbb{E}\, d^2(X, x)$$

if the distribution of the random variable $X$ is the discrete measure

$$\mu = \sum_{i=1}^{n} \lambda_i \delta_{x_i},$$

where $\delta$ denotes the Dirac measure. We call any of these minimizers a barycenter of $\mu$,
so that the Fréchet mean of $(x_i)_{1 \le i \le n}$ with weights $(\lambda_i)_{1 \le i \le n}$ is the barycenter of $\mu$.
This extends naturally to a general probability measure: a point $x$ is
said to be a barycenter of a measure $\mu$ (not necessarily finitely supported) if it minimizes

$$x \mapsto \mathbb{E}\, d^2(X, x)$$

when the distribution of the random variable $X$ is $\mu$. The first question to arise is whether
this barycenter exists.
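
As a quick sanity check of the claim that the Fréchet mean reduces to the usual weighted mean on $\mathbb{R}^d$, the following minimal sketch compares the objective at the weighted mean with its value at nearby points. The function names `frechet_objective` and `weighted_mean` and the sample points are illustrative assumptions, not taken from the paper.

```python
def frechet_objective(x, points, weights):
    """Weighted sum of squared Euclidean distances from x to the points."""
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(x, pt))
        for w, pt in zip(weights, points)
    )


def weighted_mean(points, weights):
    """Coordinate-wise weighted mean: the closed-form Frechet mean in R^d."""
    dim = len(points[0])
    return [sum(w * pt[k] for w, pt in zip(weights, points)) for k in range(dim)]


# Illustrative data (an assumption): three points in the plane, weights summing to 1.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
weights = [0.5, 0.25, 0.25]

bary = weighted_mean(points, weights)
best = frechet_objective(bary, points, weights)

# Moving away from the weighted mean can only increase the objective.
perturbed = [
    frechet_objective([bary[0] + dx, bary[1] + dy], points, weights)
    for dx, dy in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
]
```

In a general geodesic space no such closed form is available, which is why existence has to be argued separately.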

When $(E,d)$ is assumed to be a locally compact geodesic space, the Hopf-Rinow theorem
states that closed balls are compact, and thus the existence of this barycenter is straightforward.
But it is not obvious in more general cases.

In this paper, we consider barycenters in the Wasserstein space of a locally compact
geodesic space. Since the Wasserstein space of a locally compact space is, in general,
not locally compact, existence of barycenters there is not as straightforward. However, in this
setting, the Wasserstein space is a geodesic space of probability measures. The first goal of this
paper is to prove existence of barycenters in this setting.

Given $p \ge 1$, and denoting by $W_p$ the Wasserstein metric, previous works in this direction
consider the barycenter of the probability measures $(\mu_i)_{1 \le i \le n}$ with weights
$(\lambda_i)_{1 \le i \le n}$, i.e. a minimizer of the following criterion

$$\nu \mapsto \sum_{i=1}^{n} \lambda_i W_p^p(\nu, \mu_i),$$

which is thus also the barycenter of the atomic probability $\mathbb{P}$ on the Wasserstein space,
defined by

$$\mathbb{P} = \sum_{i=1}^{n} \lambda_i \delta_{\mu_i}.$$

An important result proved in [1] is the existence and uniqueness of this minimizer when
the underlying space $(E,d)$ is the Euclidean space $\mathbb{R}^d$ and $p = 2$. Uniqueness requires
the extra assumption that at least one of the $\mu_i$'s vanishes on small sets. This vanishing
property means that the considered measures give probability 0 to sets with Hausdorff
dimension less than $d-1$. In particular, any measure absolutely continuous with respect to
the Lebesgue measure vanishes on small sets. This work of [1] has been extended in [20] to
compact Riemannian manifolds, with the condition to vanish on small sets being replaced
by absolute continuity with respect to the volume measure. Since the Wasserstein space
of a compact space is also compact, the existence of the barycenter in this setting is
straightforward, but their work provides, among other results, an interesting extension
of the work of [1], by exhibiting a dual problem called the multimarginal problem, for
any $\mathbb{P}$ of the form $\sum_{i=1}^{n} \lambda_i \delta_{\mu_i}$. The same dual problem has been used in a
previous work to show existence of a barycenter whenever there exists a Borel (not necessarily
unique) barycenter application on $(E^n, d^n)$ that associates the barycenter of
$\sum_{i=1}^{n} \lambda_i \delta_{x_i}$ to every $n$-tuple $(x_1, \dots, x_n)$. This assumption is actually always
verified on locally compact geodesic spaces. This is the result of Lemma 7. It is a first step
toward the proof of existence of a barycenter for any $\mathbb{P}$.

This paper studies, in the setting of locally compact geodesic spaces, the existence
of the barycenter and states consistency properties. In previous works [?] or [?], the
authors studied asymptotic results giving conditions under which a sequence of
barycenters of discrete measures converging to a limit measure can be understood as a
barycenter of the limit probability measure. This result makes it possible to define the
barycenter of empirical measures and to study its asymptotic behavior. In the following, we
propose an improved version of this limit theorem that allows us to prove existence of
barycenters of probabilities in our more general framework.

This paper falls into the following parts. Section 1 presents general definitions and
states a general theorem that ensures the existence of a barycenter of probability measures
in the Wasserstein space. In Section 2 a consistency result is proven. Section 3 is devoted
to some statistical applications. The technical lemmas are presented in Section 5 while
the detailed proofs are postponed to Section 4.

1 Barycenter of a probability in Wasserstein space 

Given two points $x, y$ in a metric space $(E,d)$, a mid-point of $x$ and $y$ is a point $z \in E$ such
that

$$d(x,z) = d(z,y) = \tfrac{1}{2}\, d(x,y).$$

Definition (Geodesic space) A space $(E,d)$ is called a geodesic space if

• $(E,d)$ is a complete metric space and

• every two points $x, y \in E$ have a mid-point $z \in E$.

Note that in this case, a mid-point of $x$ and $y$ is a 2-barycenter of $x$ and $y$ with weights
$(\tfrac{1}{2}, \tfrac{1}{2})$.

Remark 1 Such spaces are sometimes called complete intrinsic metric length spaces (see
for instance [?]).

Given a continuous path $\gamma : [0,T] \to E$, its length is defined as

$$\Lambda(\gamma) = \sup\left\{ \sum_{i=0}^{n-1} d\big(\gamma(t_{i+1}), \gamma(t_i)\big) \;;\; 0 = t_0 < t_1 < \dots < t_n = T \right\}.$$

A continuous path is said to be a geodesic if, for any interval $[a,b] \subset [0,T]$, the
length of $\gamma$ restricted to $[a,b]$ is $d(\gamma(a), \gamma(b))$:

$$\Lambda\big(\gamma|_{[a,b]}\big) = d\big(\gamma(a), \gamma(b)\big).$$

It is known (see theorem 2.4.16 p. 42 and lemma 2.4.8 p. 41 in [?]) that a (separable)
complete metric space is geodesic if and only if for every pair $(x, y)$ there exists a geodesic
joining $x$ and $y$.
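
To make the definition of length concrete, here is a small sketch in the Euclidean plane (an assumption made purely for illustration; the names `partition_length`, `quarter_circle` and `segment` are hypothetical). It evaluates the partition sums appearing in the supremum: for a circular arc they increase toward the arc length under refinement and stay above the distance between the endpoints, whereas for the straight segment every partition sum already equals that distance, which is the geodesic property.

```python
import math


def partition_length(path, dist, n, T=1.0):
    """Sum of d(path(t_{i+1}), path(t_i)) over the uniform partition of [0, T] into n pieces."""
    ts = [T * i / n for i in range(n + 1)]
    return sum(dist(path(ts[i + 1]), path(ts[i])) for i in range(n))


def euclid(u, v):
    """Euclidean distance in the plane."""
    return math.hypot(u[0] - v[0], u[1] - v[1])


def quarter_circle(t):
    """Quarter of the unit circle, parametrized by t in [0, 1]."""
    angle = 0.5 * math.pi * t
    return (math.cos(angle), math.sin(angle))


def segment(t):
    """Straight segment between the same endpoints (1, 0) and (0, 1)."""
    return (1.0 - t, t)


# Nested uniform partitions: refining can only increase the sum (triangle inequality).
arc_sums = [partition_length(quarter_circle, euclid, n) for n in (1, 4, 64, 1024)]
seg_sums = [partition_length(segment, euclid, n) for n in (1, 4, 64)]
chord = euclid(quarter_circle(0.0), quarter_circle(1.0))
```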

Definition (Barycenter) Set $p \ge 1$ and let $(E,d)$ be a geodesic space and $\mu$ a probability
measure on $(E,d)$ such that

$$\int d^p(x, x_0)\, d\mu(x) < \infty \tag{1}$$

for some (and thus any) $x_0 \in E$. A point $x_0 \in E$ is called a $p$-barycenter of $\mu$ if

$$\int d^p(x, x_0)\, d\mu(x) = \inf\left\{ \int d^p(x, y)\, d\mu(x) \;;\; y \in E \right\}. \tag{2}$$

The set of all probability measures satisfying (1) is denoted $\mathcal{W}_p(E)$.

Barycenters do not always exist. One can find a geodesic space $(E,d)$ and a probability
measure $\mu \in \mathcal{W}_p(E)$ for which there exists no barycenter. However, the Hopf-Rinow-Cohn-
Vossen theorem (see theorem 2.5.28 p. 52 in [?]) states that, on locally compact geodesic
spaces, every closed ball is compact. Consequently, the infimum in (2) can be taken over a
compact ball, and since $y \mapsto \int d^p(x,y)\, d\mu(x)$ is continuous, existence of a barycenter is
ensured. We thus have the following proposition.

Proposition 1 Set $p \ge 1$ and let $(E,d)$ be a locally compact geodesic space and $\mu \in
\mathcal{W}_p(E)$. Then, there exists a barycenter of $\mu$.

Metric spaces of nonpositive curvature (NPC spaces) provide another setting for which
barycenters exist. We recall the definition of such spaces following [26].

Definition (NPC Spaces) A complete metric space $(E,d)$ is called a global NPC space
if for each pair of points $x_0, x_1 \in E$, there exists $y \in E$ such that for all $z \in E$,

$$d^2(z, y) \le \tfrac{1}{2}\, d^2(z, x_0) + \tfrac{1}{2}\, d^2(z, x_1) - \tfrac{1}{4}\, d^2(x_0, x_1).$$

Such spaces are geodesic spaces, and every probability measure $\mu$ on such a space that
satisfies $\int d^2(x, x_0)\, d\mu(x) < \infty$ for some $x_0 \in E$ has a unique 2-barycenter (see proposition
4.3 in [25]).


The goal of this paper is to study barycenters in Wasserstein spaces. We first recall
the definition of the Wasserstein space of a metric space $(E,d)$.

Definition (Wasserstein space) Set $p \ge 1$ and let $(E,d)$ be a metric space. Given two
measures $\mu, \nu$ in $\mathcal{W}_p(E)$, we denote by $\Gamma(\mu, \nu)$ the set of all probability measures $\pi$ over
the product set $E \times E$ with first, resp. second, marginal $\mu$, resp. $\nu$. The transportation
cost with cost function $d^p$ between two measures $\mu, \nu$ in $\mathcal{W}_p(E)$ is defined as

$$\mathcal{T}_p(\mu, \nu) = \inf_{\pi \in \Gamma(\mu, \nu)} \int d^p(x, y)\, d\pi(x, y).$$

The transportation cost allows us to endow the set $\mathcal{W}_p(E)$ with a metric $W_p$ defined by

$$W_p(\mu, \nu) = \mathcal{T}_p(\mu, \nu)^{1/p}.$$

This metric is known as the $p$-Wasserstein distance and the metric space $(\mathcal{W}_p(E), W_p)$ is
called the Wasserstein space of $(E,d)$.


It is well known (see theorem 6.9 of [28] for instance, or proposition 7.1.5 in [4]) that
$W_p$ metrizes the topology of weak convergence together with convergence of moments of order $p$
(i.e. $\int d^p(x, x_0)\, d\mu_n(x) \to \int d^p(x, x_0)\, d\mu(x)$). If $(E,d)$ is a separable complete metric
space, so is $(\mathcal{W}_p(E), W_p)$ (see theorem 6.19 in [28]).
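
For intuition, the Wasserstein distance can be computed explicitly in one special case. The sketch below assumes $E = \mathbb{R}$ and two empirical measures with the same number of equally weighted atoms; on the real line, pairing the sorted samples is an optimal coupling for the cost $d^p$ with $p \ge 1$. The function name `wasserstein_1d` is a hypothetical helper, not notation from the paper.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two uniform empirical measures on the real line.

    Assumes both samples have the same size n, each atom carrying mass 1/n.
    On the line, pairing the sorted samples is an optimal coupling for d^p, p >= 1.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    n = len(xs)
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


# A pure translation by 3 costs exactly 3, whatever the order of the atoms.
shifted = wasserstein_1d([2.0, 0.0, 1.0], [3.0, 5.0, 4.0], p=2)
# Concentrating {0, 2} onto the point 1 moves each half of the mass by 1.
spread = wasserstein_1d([0.0, 2.0], [1.0, 1.0], p=1)
```

General spaces and unequal weights require solving the full transport problem over $\Gamma(\mu, \nu)$; the sorting shortcut is specific to the line.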

Also, if $(E,d)$ is a locally compact geodesic space, then $(\mathcal{W}_p(E), W_p)$ is a geodesic
space. This result can be found in [2] (see lemma 2.4 and proposition 2.6) for the
case $(E,d)$ compact, but the arguments are valid when $(E,d)$ is locally compact as well.
However, $(\mathcal{W}_p(E), W_p)$ is not locally compact unless $(E,d)$ is compact (see remark 7.1.9
in [4]). Thus, existence results on locally compact spaces cannot be applied to prove
existence of barycenters on $(\mathcal{W}_p(E), W_p)$.

Likewise, Wasserstein spaces are not NPC spaces in general (see theorem 7.3.2 in [3]).
Indeed, two probability measures $\mu_0, \mu_1$ can have more than one mid-point in $(\mathcal{W}_p(E), W_p)$;
each mid-point is a barycenter of $\tfrac{1}{2}(\delta_{\mu_0} + \delta_{\mu_1}) \in \mathcal{W}_p(\mathcal{W}_p(E))$. However, NPC spaces
assign a unique barycenter to every probability measure. Therefore Wasserstein spaces cannot be
NPC spaces, and results for such spaces cannot be applied to prove existence of barycenters
in Wasserstein spaces.
We want to prove existence of barycenters in Wasserstein spaces. To that purpose,
we consider a random probability measure $\tilde\mu$ in $\mathcal{W}_p(E)$, following a distribution $\mathbb{P}$. This
probability $\mathbb{P}$ is chosen in the space $\mathcal{W}_p(\mathcal{W}_p(E))$ endowed with the metric $W_p$. Note that
we use the same notation for the Wasserstein distance over $\mathcal{W}_p(E)$ and $\mathcal{W}_p(\mathcal{W}_p(E))$. Thus,
if $\tilde\mu \in \mathcal{W}_p(E)$ is a random measure with distribution $\mathbb{P}$, then for all $\nu \in \mathcal{W}_p(E)$, we can
write

$$W_p^p(\delta_\nu, \mathbb{P}) = \mathbb{E}\big(W_p^p(\nu, \tilde\mu)\big) = \int W_p^p(\nu, \mu)\, d\mathbb{P}(\mu). \tag{3}$$

For a probability $\mathbb{P} \in \mathcal{W}_p(\mathcal{W}_p(E))$, consider a minimizer over $\nu \in \mathcal{W}_p(E)$ of

$$\nu \mapsto \mathbb{E}\big[W_p^p(\nu, \tilde\mu)\big] = W_p^p(\delta_\nu, \mathbb{P}),$$

where $\tilde\mu$ is a random probability of $\mathcal{W}_p(E)$ with distribution $\mathbb{P}$. If it exists, this probability
measure is a barycenter of $\mathbb{P}$.

We can now state the existence result.

Theorem 2 (Existence of a Wasserstein Barycenter) Set $p \ge 1$ and let $(E,d)$ be
a separable locally compact geodesic space. Then, for $\mathbb{P} \in \mathcal{W}_p(\mathcal{W}_p(E))$, there exists a
barycenter $\mu_{\mathbb{P}}$ defined as

$$\mu_{\mathbb{P}} \in \arg\min_{\nu \in \mathcal{W}_p(E)} \mathbb{E}\big[W_p^p(\nu, \tilde\mu)\big], \tag{4}$$

for $\tilde\mu$ a random measure with distribution $\mathbb{P}$.

Using the expression (3), we can see that Theorem 2 can be reformulated as stating the
existence of the metric projection of $\mathbb{P}$ onto the subset of $\mathcal{W}_p(\mathcal{W}_p(E))$ of Dirac measures.
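
The criterion in (4) can be evaluated by hand in a toy case. The sketch below assumes $E = \mathbb{R}$, $p = 2$, and a finitely supported $\mathbb{P}$ whose atoms are empirical measures with the same number of equally weighted points; in that special case the barycenter is known to be obtained by averaging order statistics (quantile functions). The names `w2_squared`, `barycenter_1d` and `objective` are illustrative assumptions.

```python
def w2_squared(xs, ys):
    """Squared 2-Wasserstein distance between equal-size uniform empirical measures on R."""
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def barycenter_1d(samples, weights):
    """Atoms of the 2-Wasserstein barycenter: weighted average of the order statistics."""
    ordered = [sorted(s) for s in samples]
    n = len(ordered[0])
    return [sum(w * s[k] for w, s in zip(weights, ordered)) for k in range(n)]


def objective(candidate, samples, weights):
    """The criterion nu -> sum_j lambda_j W_2^2(nu, mu_j), i.e. W_2^2(delta_nu, P)."""
    return sum(w * w2_squared(candidate, s) for w, s in zip(weights, samples))


# Toy data (an assumption): two measures with two atoms each, equal weights.
samples = [[0.0, 2.0], [6.0, 4.0]]
weights = [0.5, 0.5]

bary = barycenter_1d(samples, weights)             # atoms [2.0, 4.0]
value = objective(bary, samples, weights)          # criterion at the barycenter
other = objective([2.5, 4.0], samples, weights)    # criterion at a competitor
```

Here `value` is the squared distance from $\mathbb{P}$ to its projection onto Dirac measures, in the sense of the reformulation above.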

Proof The proof of Theorem 2 relies on the existence of barycenters of finitely supported
measures in $\mathcal{W}_p(E)$, for which the core ideas were developed in [1]. Those ideas are used
for the first step of this proof. The proof is split into three steps.

• First, consider a set of probability measures $\mu_j$, $j = 1, \dots, J$, of $\mathcal{W}_p(E)$ and assume
that $\mathbb{P}$ is a discrete measure defined, for positive weights $\lambda_1, \dots, \lambda_J$ such that
$\sum_{j=1}^{J} \lambda_j = 1$, as

$$\mathbb{P} = \sum_{j=1}^{J} \lambda_j \delta_{\mu_j}.$$

In this case

$$W_p^p(\delta_\nu, \mathbb{P}) = \mathbb{E}\, W_p^p(\nu, \tilde\mu) = \sum_{j=1}^{J} \lambda_j W_p^p(\nu, \mu_j).$$

Within this framework, Theorem 2 reduces to an already solved problem, in the
case $p = 2$ in [1] or [?] and for general $p$ in [?]. It is recalled in this paper as
Theorem 8.

• To prove the theorem in the general case, we show that if there is a sequence of
probability measures $(\mathbb{P}_j)_{j \ge 1}$ converging to a limit probability measure $\mathbb{P}$ and if for
each $\mathbb{P}_j$ there exists a barycenter $\mu_{\mathbb{P}_j}$, then there exists a barycenter $\mu_{\mathbb{P}}$ of the limit
probability $\mathbb{P}$. Moreover $\mu_{\mathbb{P}}$ is the limit of a subsequence of the barycenters $(\mu_{\mathbb{P}_j})_{j \ge 1}$.
This result is stated as Theorem 3 in the following section.

• Finally, Proposition 13 concludes the proof by showing that one can approximate any
probability measure in $\mathcal{W}_p(\mathcal{W}_p(E))$ by finitely supported probability measures.


2 Consistency of the barycenter of a sequence of measures

The following theorem deals with the continuity of barycenters. Consider a
sequence $(\mathbb{P}_j)_{j \ge 1} \subset \mathcal{W}_p(\mathcal{W}_p(E))$ converging to some $\mathbb{P}$ in $\mathcal{W}_p(\mathcal{W}_p(E))$. If these
measures all admit a barycenter, it is natural to ask whether the sequence of barycenters also
converges to a barycenter of $\mathbb{P}$. Theorem 3 provides a positive answer.

Theorem 3 Set $p \ge 1$ and let $(E,d)$ be a separable locally compact geodesic space. Let
$(\mathbb{P}_j)_{j \ge 1} \subset \mathcal{W}_p(\mathcal{W}_p(E))$ be a sequence of probability measures on $\mathcal{W}_p(E)$ and let $\mu_j$ be a
barycenter of $\mathbb{P}_j$, for all $j \in \mathbb{N}$. Suppose that for some $\mathbb{P} \in \mathcal{W}_p(\mathcal{W}_p(E))$, we have
$W_p(\mathbb{P}, \mathbb{P}_j) \to 0$ as $j \to \infty$. Then, the sequence $(\mu_j)_{j \ge 1}$ is precompact in $\mathcal{W}_p(E)$ and any
limit is a barycenter of $\mathbb{P}$.

Sketch of proof The proof of Theorem 3 can be split into three steps.

• The first step shows that the sequence of barycenters $(\mu_j)_{j \ge 1}$ is tight. It is a
consequence of the fact that closed balls of $(E,d)$ are compact, together with Markov's
inequality applied to these balls.

• The second step uses Skorokhod's representation theorem and lower semicontinuity
of $\nu \mapsto W_p(\mu, \nu)$ for any $\mu$, to show that any weak limit of the sequence $(\mu_j)_{j \ge 1}$ is a
barycenter of $\mathbb{P}$.

• The final step proves that the convergence of the $(\mu_j)_{j \ge 1}$ actually holds in $\mathcal{W}_p(E)$.

Applying this result to a constant sequence gives the following corollary.

Corollary 4 The set of all barycenters of a given measure $\mathbb{P} \in \mathcal{W}_p(\mathcal{W}_p(E))$ is compact.

An interesting and immediate corollary follows from the assumption that $\mathbb{P}$ has a
unique barycenter.

Corollary 5 Suppose $\mathbb{P} \in \mathcal{W}_p(\mathcal{W}_p(E))$ has a unique barycenter. Then for any sequence
$(\mathbb{P}_j)_{j \ge 1} \subset \mathcal{W}_p(\mathcal{W}_p(E))$ converging to $\mathbb{P}$, any sequence $(\mu_j)_{j \ge 1}$ of their barycenters
converges to the barycenter of $\mathbb{P}$.


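On $E = \mathbb{R}^d$ and $p = 2$, there exists a simple condition under which the barycenter is
unique.

Proposition 6 Let $\mathbb{P} \in \mathcal{W}_2(\mathcal{W}_2(\mathbb{R}^d))$ be such that there exists a set $A \subset \mathcal{W}_2(\mathbb{R}^d)$ of
measures such that for all $\mu \in A$,

$$B \in \mathcal{B}(\mathbb{R}^d),\ \dim(B) < d - 1 \implies \mu(B) = 0, \tag{5}$$

and $\mathbb{P}(A) > 0$. Then $\mathbb{P}$ admits a unique barycenter.

Therefore, for any sequence $(\mathbb{P}_j)_{j \ge 1}$ converging to $\mathbb{P}$ in $\mathcal{W}_2(\mathcal{W}_2(\mathbb{R}^d))$, the barycenters of
$\mathbb{P}_j$ converge to the barycenter of $\mathbb{P}$.

Proof It is a consequence of the fact that if $\nu$ satisfies (5), then $\mu \mapsto W_2^2(\mu, \nu)$ is strictly
convex and thus so is $\mu \mapsto \mathbb{E}\, W_2^2(\mu, \tilde\mu)$.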

3 Statistical applications 

Two statistical frameworks 

When confronted with the statistical analysis of a collection of probability measures
$\mu_1, \dots, \mu_J$ in $\mathcal{W}_p(E)$, it is natural to define a notion of variability as

$$\inf_{\nu \in \mathcal{W}_p(E)} \frac{1}{J} \sum_{j=1}^{J} W_p^p(\nu, \mu_j).$$

This quantity plays the role of a variance which measures the spread, with respect to the
Wasserstein distance, of the measures around a point which is the Wasserstein barycenter.
In this work, we extend this definition to match the notion of variance by defining

$$V(\tilde\mu) = \inf_{\nu \in \mathcal{W}_p(E)} \mathbb{E}\big(W_p^p(\nu, \tilde\mu)\big),$$

where $\tilde\mu$ is a random probability measure in $\mathcal{W}_p(E)$. We provide conditions that
ensure that this quantity is well defined and is achieved for a measure $\mu_{\mathbb{P}}$, which plays the
role of the mean of the random measure $\tilde\mu$. Moreover, statistical inference in this setting
has been tackled in two different frameworks, depending on whether the number of probabilities
goes to infinity or whether the probabilities are not observed directly but through empirical
samples. Theorem 3 handles both of these settings.
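
The variance-like quantity above can be computed in closed form in a toy setting. The sketch below assumes $E = \mathbb{R}$, $p = 2$ and equal-size uniform empirical measures, where the barycenter is given by quantile averaging. It also checks the identity "variance equals half the weighted pairwise squared distances", which holds in this one-dimensional case because $W_2$ is then a Hilbert distance between quantile functions; that identity is a property of this special case, not a claim made in the paper. All function names are hypothetical.

```python
def w2_squared(xs, ys):
    """Squared 2-Wasserstein distance between equal-size uniform empirical measures on R."""
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def wasserstein_variance(samples, weights):
    """sum_j lambda_j W_2^2(barycenter, mu_j), with the barycenter from quantile averaging."""
    ordered = [sorted(s) for s in samples]
    n = len(ordered[0])
    bary = [sum(w * s[k] for w, s in zip(weights, ordered)) for k in range(n)]
    return sum(w * w2_squared(bary, s) for w, s in zip(weights, ordered))


def half_pairwise(samples, weights):
    """(1/2) * sum_{i,j} lambda_i lambda_j W_2^2(mu_i, mu_j)."""
    return 0.5 * sum(
        wi * wj * w2_squared(si, sj)
        for wi, si in zip(weights, samples)
        for wj, sj in zip(weights, samples)
    )


weights = [1 / 3, 1 / 3, 1 / 3]
tight = [[0.0, 1.0], [0.1, 1.1], [0.2, 1.2]]      # nearly identical measures
loose = [[0.0, 1.0], [5.0, 6.0], [10.0, 11.0]]    # widely separated measures

v_tight = wasserstein_variance(tight, weights)
v_loose = wasserstein_variance(loose, weights)
```

A small value signals a homogeneous collection, a large one a dispersed collection, which is the use made of this quantity in the testing procedures mentioned later in this section.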


The first point of view concerns the case where the distribution $\mathbb{P} \in \mathcal{W}_p(\mathcal{W}_p(E))$ is
approximated by a growing discrete distribution $\mathbb{P}_J$ supported on $J$ elements, with $J$
growing to infinity. Consider a collection of measures $\mu_j \in \mathcal{W}_p(E)$ for $j \ge 1$, and weights
$\lambda_j \ge 0$ (normalized so that $\mathbb{P}_J$ is a probability measure), and define the sequence of
measures $\mathbb{P}_J$, $J \ge 1$, as follows

$$\mathbb{P}_J = \sum_{j=1}^{J} \lambda_j \delta_{\mu_j}.$$

Assume that $\mathbb{P}_J$ converges to some measure $\mathbb{P}$ with respect to the Wasserstein distance. Then
Theorem 3 states that the barycenter (or any barycenter if not unique) of $\mathbb{P}_J$ converges
to the barycenter of $\mathbb{P}$ (provided $\mathbb{P}$ has a unique barycenter).

The second asymptotic point of view deals with the case where the measures $\mu_j$ are
unknown but approximated by a sequence of measures $\mu_j^n$ converging with respect to
the Wasserstein distance to the measures $\mu_j$ when $n$ grows to infinity. Compared to the
first framework, the number of measures here is fixed but only an estimate of the
measures is known. This covers the interesting case where we observe i.i.d. samples $X_{i,j}$,
$i = 1, \dots, n$, with distribution $\mu_j \in \mathcal{W}_p(E)$. Here $\mu_j^n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_{i,j}}$ is the empirical
measure. Given positive weights $(\lambda_j)_{1 \le j \le J}$ (or a sequence of weights converging to them),
the issue is whether the barycenter of the observed measure $\sum_{j=1}^{J} \lambda_j \delta_{\mu_j^n}$ converges to the
barycenter of the limit $\sum_{j=1}^{J} \lambda_j \delta_{\mu_j}$ in the case where this barycenter is unique. This
problem has been answered positively in [?], up to extracting a subsequence when the
barycenter is not unique. Within this framework, set

$$\mathbb{P}_n = \sum_{j=1}^{J} \lambda_j \delta_{\mu_j^n}$$

with positive weights $\lambda_j$ and measures $(\mu_j^n)_{1 \le j \le J,\, n \ge 1}$ of $\mathcal{W}_p(E)$ converging to some limit
measures $(\mu_j)_{1 \le j \le J} \in \mathcal{W}_p(E)^J$. Then Theorem 3 states that the barycenter of $\mathbb{P}_n$ (or any
barycenter if not unique) converges to the barycenter of $\sum_{j=1}^{J} \lambda_j \delta_{\mu_j}$ (if unique).
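
The second framework can be illustrated numerically. The sketch below is a toy experiment under assumptions that are not in the paper: $E = \mathbb{R}$, $p = 2$, two Gaussian laws with hypothetical parameters, and barycenters of empirical measures computed by quantile averaging. In this Gaussian case the limit barycenter is again Gaussian, with the averaged mean and averaged standard deviation, so the error of the empirical barycenter can be measured against a fine discretization of that limit.

```python
import random
from statistics import NormalDist


def w2(xs, ys):
    """2-Wasserstein distance between equal-size uniform empirical measures on R."""
    n = len(xs)
    return (sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n) ** 0.5


def empirical_barycenter(samples, weights):
    """Quantile averaging: barycenter atoms of the empirical measures (p = 2, E = R)."""
    ordered = [sorted(s) for s in samples]
    return [sum(w * s[k] for w, s in zip(weights, ordered)) for k in range(len(ordered[0]))]


def error(n, rng):
    """W_2 distance between the empirical barycenter and a discretized limit barycenter."""
    params = [(-1.0, 1.0), (3.0, 2.0)]   # hypothetical (mean, std) of mu_1 and mu_2
    weights = [0.5, 0.5]
    samples = [[rng.gauss(m, s) for _ in range(n)] for m, s in params]
    bary = empirical_barycenter(samples, weights)
    # Averaged mean 0.5*(-1) + 0.5*3 = 1.0 and averaged std 0.5*1 + 0.5*2 = 1.5.
    limit = NormalDist(mu=1.0, sigma=1.5)
    reference = [limit.inv_cdf((k + 0.5) / n) for k in range(n)]
    return w2(bary, reference)


rng = random.Random(0)   # fixed seed so that the experiment is reproducible
errors = {n: error(n, rng) for n in (50, 500, 5000)}
```

The values in `errors` should shrink as the sample size grows, in line with the convergence asserted by Theorem 3 when the limit barycenter is unique.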

Implications of the results 

The existence and consistency of Wasserstein barycenters have several implications in
statistics.

First, the variance of a collection of measures is helpful to understand the separation
between collections of probability measures. Goodness-of-fit testing procedures have been
developed to assess similarity between two samples as in [11] or [15]. The test statistic
relies on the computation of the variance of the sample. Its calculation uses an expression
that involves the Wasserstein distance of each distribution to the mean of the probability,
and this mean is well defined once the existence of the mean distribution has been proved.

Then, one of the major applications is given by deformation models or registration
of distributions. In these problems, one assumes that the observations are obtained by warping
an unknown template distribution $\mu$ through a random deformation process. The goal is
to estimate the template using the observations. More precisely, the probability
measures $\mu_j$ are warped from the template by a centered random deformation operator $T$
with realizations $T_j$, such that

$$\mu_j = \mu \circ T_j^{-1}.$$

Then the barycenter of the $\mu_j$'s is a proper estimate for the unobserved template. In a
previous work [?], a similar result has been proved under a more restrictive assumption on
the $\mu_j$'s: this result was proven in the case when $E = \mathbb{R}^d$ and the $(\mu_j)_{j \ge 1}$ are admissible
deformations in the sense that they can be written as the pushforward of a common
probability measure $\mu$ by the gradient of a convex function. This setting has also been
considered in [2]. In [9] this problem is also tackled in the particular case where the
$\mu_j$ have compact support, are absolutely continuous with respect to the Lebesgue
measure and are indexed on a compact set $\Theta$ of $\mathbb{R}^d$. They state more precisely that given
a probability measure on $\Theta$, one can induce a probability measure $\mathbb{P}$ on $\mathcal{W}_p(\mathbb{R}^d)$, and if the
$(\mu_j)_{j \ge 1}$ are chosen randomly under $\mathbb{P}^{\otimes \infty}$, the (unique) barycenter of $\frac{1}{J} \sum_{j=1}^{J} \delta_{\mu_j}$
converges to the barycenter of $\mathbb{P}$, $\mathbb{P}$-almost surely. In our case, we handle the general case where
the family of deformations is a random function which induces a random distribution $\tilde\mu$
with distribution $\mathbb{P}$ given by the law of the deformation, which makes it possible to consider
general random deformation models. Natural applications in biology arise when dealing with gene
expressions that suffer from a huge variability due to the different ways of processing the
data. The first task preliminary to any analysis is a normalization procedure to extract a
mean feature which corresponds to the mean distribution or the Wasserstein barycenter,
as proved in [?] or [?]. In all these cases, our result provides the existence of the target
mean distribution, while the consistency results allow the barycenters to be approximated
by taking the barycenter of noisy data samples.
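
The template-recovery idea can be checked on a toy deformation model. The sketch below makes several assumptions that are not in the paper: $E = \mathbb{R}$, $p = 2$, a template with three equally weighted atoms, and increasing affine warps $x \mapsto a x + b$ whose average is the identity (one possible reading of a "centered" deformation). Under these assumptions, quantile averaging of the warped measures returns the template exactly. The names `pushforward` and `barycenter_1d` are hypothetical helpers.

```python
def pushforward(atoms, transform):
    """Image measure T#mu of a uniform empirical measure: apply T to every atom."""
    return [transform(x) for x in atoms]


def barycenter_1d(samples, weights):
    """2-Wasserstein barycenter on R for equal-size samples: average the order statistics."""
    ordered = [sorted(s) for s in samples]
    n = len(ordered[0])
    return [sum(w * s[k] for w, s in zip(weights, ordered)) for k in range(n)]


template = [-1.0, 0.0, 2.0]              # hypothetical template atoms
# Hypothetical increasing affine warps (a, b); slopes average to 1, shifts to 0.
warps = [(0.5, 1.0), (1.5, -1.0)]
observed = [pushforward(template, lambda x, a=a, b=b: a * x + b) for a, b in warps]

# Only the warped measures are used to build the estimate.
estimate = barycenter_1d(observed, [0.5, 0.5])
```

Since each warp is increasing, the quantile function of a warped measure is the warp applied to the template quantile function, so averaging quantiles averages the warps themselves.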

More generally, finding a way to combine complex information from several
sources is a problem that is receiving growing interest, in particular when the data can
be modeled as random distributions or samples of distributions. This is the case in Big Data
when we want to exploit massive data sets that may have been collected by different
units or that are too large to be analyzed on a single machine. Hence
inference on such data sets cannot be conducted using all the data, and the barycenter
of the distributions is a natural candidate to summarize the information conveyed by all the
data. In this framework the barycenter distribution plays the role of a consensus
distribution that could represent a consensus-based global estimation or confidence region.
This point of view is developed in [3], where the mean of the distribution is chosen as a
representative distribution. Similar cases are considered in information fusion, where the
goal is similar since it amounts to finding a mean measure that aggregates the information
provided by different input measures. Hence the Wasserstein barycenter is a natural way
to aggregate this information, as pointed out in [10]. In multi-target tracking, the main
issue is the estimation of both the number and locations of multiple moving targets such
as airplanes based on sensor measurements. In [8] the Wasserstein barycenter provides
an alternative to the MOSPA (Mean Optimal Sub-Pattern Assignment) distance.

Finally, when considering Bayesian inference, one is faced with the problem of
approximating a posterior measure. Such an approximation can be obtained by sampling from the
posterior given the data. For large data sets, computation of such samples becomes intractable.
Thus, one can split the data into small subsets and combine the results of these local
computations. Taking the mean of the Bayesian posterior measures provides a natural
way to combine these local computations, as pointed out in [25].

4 Proofs 

Lemma 7 (Borel barycenter application) Set $p \ge 1$ and let $(E,d)$ be a separable
locally compact geodesic space. Then, given any $J \in \mathbb{N}^*$ and weights $(\lambda_j)_{1 \le j \le J}$, there
exists a Borel application $T : E^J \to E$ that associates to $(x_j)_{1 \le j \le J}$ a minimizer of
$x \mapsto \sum_{j=1}^{J} \lambda_j d^p(x, x_j)$. Such applications will be called Borel barycenter applications.

Proof of Lemma 7 Since $(E,d)$ is locally compact, applying theorem A.5 in [29] with
$X = E^J$, $Y = E$ and

$$A = \left\{ (x_1, \dots, x_J, x) \in X \times Y \;;\; \sum_{j=1}^{J} \lambda_j d^p(x, x_j) \le \sum_{j=1}^{J} \lambda_j d^p(z, x_j) \ \ \forall z \in E \right\}$$

shows the existence of a Borel section $f$ from $\pi_X(A)$ to $X \times Y$ of the projection
$\pi_X : X \times Y \to X$. Then $T = \pi_Y \circ f$ is a Borel barycenter application, where
$\pi_Y : X \times Y \to Y$ denotes the projection.

Theorem 8 (Barycenter and multi-marginal problem) Let $(E,d)$ be a complete separable
geodesic space, $p \ge 1$ and $J \in \mathbb{N}^*$. Given $(\mu_i)_{1 \le i \le J} \in \mathcal{W}_p(E)^J$ and weights
$(\lambda_i)_{1 \le i \le J}$, there exists a measure $\gamma \in \Gamma(\mu_1, \dots, \mu_J)$ minimizing

$$\hat\gamma \mapsto \int \inf_{x \in E} \sum_{1 \le i \le J} \lambda_i d^p(x_i, x)\, d\hat\gamma(x_1, \dots, x_J).$$

Moreover, denote by $T : E^J \to E$ a Borel barycenter application (as in Lemma 7); then
the measure $\nu = T_\# \gamma$ is a barycenter of $(\mu_i)_{1 \le i \le J}$ and, if this application is unique, any
barycenter $\nu$ is of the form $\nu = T_\# \gamma$.
Proof of Theorem 8 This proof is adapted from proposition 4.2 of [1].

Existence of the solution of the multi-marginal problem is a direct consequence of
Lemma 12.

Denote by $\gamma$ a solution of the multi-marginal problem and set $\nu = T_\# \gamma$. Then, by
definition of the Wasserstein distance,

$$W_p^p(\mu_i, \nu) \le \int d^p(x_i, T(x))\, d\gamma(x),$$

and consequently,

$$\sum_{1 \le i \le J} \lambda_i W_p^p(\mu_i, \nu) \le \int \sum_{1 \le i \le J} \lambda_i d^p(x_i, T(x))\, d\gamma(x). \tag{6}$$

Also, for $\hat\nu \in \mathcal{W}_p(E)$, denote by $\pi_i \in \Gamma(\mu_i, \hat\nu)$ the optimal transport plan between $\mu_i$ and
$\hat\nu$. Using the disintegration theorem, for any $1 \le i \le J$, there exists a (conditional) measure
$\mu_i^y$ defined for $\hat\nu$-almost any $y$, which satisfies $\pi_i(dx, dy) = \mu_i^y(dx) \otimes \hat\nu(dy)$. Set then

$$\theta(dx, dy) = \mu_1^y(dx_1) \otimes \dots \otimes \mu_J^y(dx_J) \otimes \hat\nu(dy),$$

and denote by $\hat\gamma$ the law of the $J$ first marginals of $\theta$. Then, by construction of $\theta$,

$$\begin{aligned}
\sum_{1 \le i \le J} \lambda_i W_p^p(\mu_i, \hat\nu)
&= \sum_{1 \le i \le J} \lambda_i \int d^p(x_i, y)\, d\theta(x, y) \\
&= \int \sum_{1 \le i \le J} \lambda_i d^p(x_i, y)\, d\theta(x, y) \\
&\ge \int \inf_{z \in E} \sum_{1 \le i \le J} \lambda_i d^p(x_i, z)\, d\theta(x, y) \\
&= \int \sum_{1 \le i \le J} \lambda_i d^p(x_i, T(x))\, d\theta(x, y) \\
&= \int \sum_{1 \le i \le J} \lambda_i d^p(x_i, T(x))\, d\hat\gamma(x) \\
&\ge \int \sum_{1 \le i \le J} \lambda_i d^p(x_i, T(x))\, d\gamma(x) \\
&\ge \sum_{1 \le i \le J} \lambda_i W_p^p(\mu_i, \nu),
\end{aligned} \tag{7}$$

where the second-to-last inequality follows from the optimality of $\gamma$ in the multi-marginal
problem and the last inequality is an application of (6).

Since $\hat\nu$ is arbitrary, we have just shown that $T_\# \gamma$ is a barycenter.

Also, taking $\hat\nu$ a barycenter, (7) becomes an equality, so that for $\theta$-almost any $(x, y) \in
E^J \times E$,

$$\sum_{1 \le i \le J} \lambda_i d^p(x_i, y) = \inf_{z \in E} \sum_{1 \le i \le J} \lambda_i d^p(x_i, z) = \sum_{1 \le i \le J} \lambda_i d^p(x_i, T(x)),$$

and thus, if the barycenter application $T$ is unique, $T(x) = y$ $\theta$-almost surely and so
$T_\# \hat\gamma = \hat\nu$. Also, optimality of $\hat\nu$ and (7) show that $\hat\gamma$ is a solution of the multi-marginal
problem.

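Proof of Theorem 3 Denote by $\mu_j$ a barycenter of $\mathbb{P}_j$. The proof is in three steps.

1. Proving the tightness of the sequence of the barycenters $(\mu_j)_{j \ge 1}$.

2. Proving that any limit $\mu$ of $(\mu_j)_{j \ge 1}$ (in the sense of the weak convergence of measures)
is a barycenter.

3. Proving that there exists $\nu \in \mathcal{W}_p(E)$ such that $W_p(\nu, \mu_j) \to W_p(\nu, \mu)$. The conclusion
of the proof will be derived from Lemma 14.

Let $\tilde\mu$ and $\tilde\mu_j$ be random measures with distribution respectively $\mathbb{P}$ and $\mathbb{P}_j$.

1. First prove that the moments of order $p$ of the barycenters $\mu_j$ can be bounded from
above by a constant $M < \infty$.

Let $\tilde\mu_j$ be a random measure drawn according to the distribution $\mathbb{P}_j$. Then, for any
$x \in E$,

$$\begin{aligned}
W_p(\mu_j, \delta_x) &= W_p(\delta_{\mu_j}, \delta_{\delta_x}) \\
&\le \big(\mathbb{E}\, W_p^p(\mu_j, \tilde\mu_j)\big)^{1/p} + \big(\mathbb{E}\, W_p^p(\tilde\mu_j, \delta_x)\big)^{1/p} \\
&\le 2 \big(\mathbb{E}\, W_p^p(\tilde\mu_j, \delta_x)\big)^{1/p} && \text{since } \mu_j \text{ is a minimizer of } \nu \mapsto \mathbb{E}\, W_p^p(\nu, \tilde\mu_j) \\
&= 2\, W_p(\mathbb{P}_j, \delta_{\delta_x}) \\
&\le 2 \big(W_p(\mathbb{P}_j, \mathbb{P}) + W_p(\mathbb{P}, \delta_{\delta_x})\big) \le M < \infty && \text{since } W_p(\mathbb{P}, \mathbb{P}_j) \to 0.
\end{aligned}$$

Denote by $B(x, r)$ the ball of $E$ centered at $x$ with radius $r$. Then Markov's inequality
entails that

$$\mu_j\big(B(x, r)^c\big) \le \frac{1}{r^p} \int d^p(x, y)\, d\mu_j(y) = \frac{W_p^p(\mu_j, \delta_x)}{r^p} \le \frac{M^p}{r^p}.$$

The compactness of the closed balls of $E$ entails that the sequence $(\mu_j)_{j \ge 1}$ is tight. So
a subsequence can be extracted which converges weakly towards a distribution that will be
denoted $\mu$. For ease of notation, the subsequence will be denoted as the initial
sequence.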
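2. Let $\nu \in \mathcal{W}_p(E)$ and $\tilde\mu$ a random measure with distribution $\mathbb{P}$. We get

$$\begin{aligned}
\mathbb{E}\, W_p^p(\nu, \tilde\mu) &= W_p^p(\delta_\nu, \mathbb{P}) \\
&= \lim_{j \to \infty} W_p^p(\delta_\nu, \mathbb{P}_j) && \text{since } W_p(\mathbb{P}_j, \mathbb{P}) \to 0 \\
&= \lim_{j \to \infty} \mathbb{E}\, W_p^p(\nu, \tilde\mu_j) \\
&\ge \lim_{j \to \infty} \mathbb{E}\, W_p^p(\mu_j, \tilde\mu_j) && \text{since } \mu_j \text{ is a barycenter} \qquad (8) \\
&\ge \mathbb{E} \liminf_{j \to \infty} W_p^p(\mu_j, \tilde\mu_j) && \text{using Fatou's Lemma for any coupling of the } \tilde\mu_j\text{'s} \\
&\ge \mathbb{E}\, W_p^p(\mu, \tilde\mu) && \text{since } W_p \text{ is lower semi-continuous.}
\end{aligned}$$

For the last inequality, we used that since $\mathbb{P}_j \to \mathbb{P}$, Skorokhod's representation
theorem allows us to build $\tilde\mu_j \to \tilde\mu$ a.s. This proves that $\mu$ is a barycenter of $\mathbb{P}$.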

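3. For $\nu = \mu$, the inequality (8) is in fact an equality, which implies that

$$W_p(\delta_{\mu_j}, \mathbb{P}_j) \to W_p(\delta_\mu, \mathbb{P}).$$

Hence

$$W_p(\delta_{\mu_j}, \mathbb{P}) - W_p(\delta_\mu, \mathbb{P}) \le W_p(\delta_{\mu_j}, \mathbb{P}_j) + W_p(\mathbb{P}_j, \mathbb{P}) - W_p(\delta_\mu, \mathbb{P}) \to 0,$$

while the left-hand side is nonnegative because $\mu$ is a barycenter of $\mathbb{P}$. This implies that

$$\begin{aligned}
\mathbb{E}\, W_p^p(\mu, \tilde\mu) &= W_p^p(\delta_\mu, \mathbb{P}) \\
&= \lim_{j \to \infty} W_p^p(\delta_{\mu_j}, \mathbb{P}) \\
&= \lim_{j \to \infty} \mathbb{E}\, W_p^p(\mu_j, \tilde\mu) \\
&\ge \mathbb{E} \liminf_{j \to \infty} W_p^p(\mu_j, \tilde\mu) && \text{using Fatou's Lemma} \\
&\ge \mathbb{E}\, W_p^p(\mu, \tilde\mu) && \text{using again lower semi-continuity of } W_p \text{ for weak convergence.}
\end{aligned}$$

So $\mathbb{P}$-a.s. (since $\liminf W_p(\mu_j, \tilde\mu) \ge W_p(\mu, \tilde\mu)$),

$$\liminf_{j \to \infty} W_p(\mu_j, \tilde\mu) = W_p(\mu, \tilde\mu).$$

So along a subsequence and for some $\nu \in \mathcal{W}_p(E)$, $W_p(\mu_j, \nu) \to W_p(\mu, \nu)$. So using
Lemma 14 we get that

$$W_p(\mu_j, \mu) \to 0,$$

which concludes the proof.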
5 Technical Lemmas

The following five results are well known. They are recalled here for the clarity of
the proofs.

Lemma 9 (Consistency in $L^1$) Let $(X_n)_{n \ge 1}$ be a sequence of real valued random variables
such that

$$X_n \to X \ \text{a.s.}, \qquad \mathbb{E}|X_n| \to \mathbb{E}|X|.$$

Then, $X_n \to X$ in $L^1$.

Lemma 10 (Uniform integrability) A family $\mathcal{H}$ of real valued random variables is
uniformly integrable (in the sense that $\sup_{X \in \mathcal{H}} \mathbb{E}\, |X| \mathbf{1}_{\{|X| > a\}} \to 0$ as $a \to +\infty$) if and
only if the two following conditions hold

i) $\sup_{X \in \mathcal{H}} \mathbb{E}|X| < \infty$ (bounded in $L^1$)

ii) $\forall \varepsilon > 0$, $\exists \alpha > 0$ such that $\forall A \in \mathcal{A}$, $\big(\mathbb{P}(A) < \alpha \implies \sup_{X \in \mathcal{H}} \int_A |X|\, d\mathbb{P} < \varepsilon\big)$
(equicontinuity).

Lemma 11 (Consistency in $L^1$ and uniform integrability) Let $X_n \to X$ in probability;
then the sequence $(X_n)_{n \ge 1}$ is uniformly integrable if and only if $X_n \to X$ in $L^1$.

Lemma 12 (Tightness of fixed marginals set of measures) Let $(E,d)$ be a Polish
space (i.e. a complete separable metric space). Let $C_1, \dots, C_J$ be compact sets of $\mathcal{W}_p(E)$.
Then, the set $\Gamma(C_1, \dots, C_J)$, defined as the set of probability measures on $E^J$ with marginals
respectively in $C_1, \dots, C_J$, is compact.

Proposition 13 (Approximation by finitely supported measures) For all $\mathbb{P}$ there
is a sequence of finitely supported probabilities $\mathbb{P}_j$ such that

$$W_p(\mathbb{P}_j, \mathbb{P}) \to 0.$$
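
The approximation principle of Proposition 13 can be visualized in a much simpler analogue. The proposition concerns measures on $\mathcal{W}_p(E)$; the sketch below only treats a measure on $\mathbb{R}$ (a standard Gaussian, chosen here as an assumption) and approximates it by $n$ equally weighted atoms placed at mid-quantiles. The $W_2$ error is measured against a much finer discretization used as a proxy for the target. The names `quantile_atoms` and `w2_to_reference` are hypothetical.

```python
from statistics import NormalDist


def quantile_atoms(dist, n):
    """n equally weighted atoms placed at the mid-quantiles of dist (in increasing order)."""
    return [dist.inv_cdf((k + 0.5) / n) for k in range(n)]


def w2_to_reference(atoms, reference):
    """W_2 between an n-atom measure and a finer N-atom reference, assuming n divides N.

    Both lists are sorted, and each coarse atom is split into N // n copies,
    so the monotone pairing used here is optimal on the real line.
    """
    rep = len(reference) // len(atoms)
    cost = sum((reference[k] - atoms[k // rep]) ** 2 for k in range(len(reference)))
    return (cost / len(reference)) ** 0.5


target = NormalDist()                          # standard Gaussian, a stand-in target
reference = quantile_atoms(target, 10000)      # fine discretization used as a proxy
errors = [w2_to_reference(quantile_atoms(target, n), reference) for n in (10, 100, 1000)]
```

The errors decrease as the number of atoms grows, which is the one-dimensional counterpart of the convergence $W_p(\mathbb{P}_j, \mathbb{P}) \to 0$.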

Here is a lemma used for the proof of Theorem 3.

Lemma 14 Let $(\mu_n)_{n \ge 1}$ be a sequence of measures on a Polish space $(E,d)$ which
converges weakly towards $\mu$. If there exists a measure $\nu$ such that

$$W_p(\mu_n, \nu) \to W_p(\mu, \nu),$$

then

$$W_p(\mu_n, \mu) \to 0. \tag{9}$$
Proof Note first that if $\nu = \delta_x$ for a given $x \in E$, then (9) is true, due to the fact that
Wasserstein convergence is equivalent to weak convergence plus convergence of the
order $p$ moments (see [28]).

First, using the Gluing Lemma (see for instance [28] or [?]), build three sequences
$(X_n)_{n \ge 1}$, $(Y_n)_{n \ge 1}$, $(Z_n)_{n \ge 1}$ with distribution respectively $\mu_n$, $\nu$ and $\mu$ such that

$$(X_n, Y_n) \sim \pi_n^1, \qquad (Y_n, Z_n) \sim \pi^2,$$

where $\pi_n^1$ and $\pi^2$ are optimal transport plans between respectively $\mu_n$ and $\nu$ and
between $\nu$ and $\mu$. Let $\Pi_n$ be the distribution of $(X_n, Y_n, Z_n)$. Since the three marginals
weakly converge, the sequence $(\Pi_n)_{n \ge 1}$ is tight. Thus, we can extract a subsequence such
that

$$\Pi_n \to \Pi \ \text{weakly},$$

where $\Pi$ has marginal probabilities $\mu$, $\nu$ and $\mu$.

Then Skorokhod's representation theorem allows us to construct a space $(\Omega, \mathcal{A}, \mathbb{P})$ on
which there exist $X, Y, Z$ with joint distribution $\Pi$ and copies of $(X_n, Y_n, Z_n)$ with law $\Pi_n$
such that

$$d(X_n, X) + d(Y_n, Y) + d(Z_n, Z) \to 0 \quad \mathbb{P}\text{-a.s.}$$

If we show that $(d^p(X_n, X))_{n \ge 1}$ is uniformly integrable then, using Lemma 11, we get

$$\mathbb{E}\, d^p(X_n, X) \to 0,$$

which implies the result since $W_p^p(\mu_n, \mu) \le \mathbb{E}\, d^p(X_n, X)$.

Uniform integrability remains to be proven. Note that Lemma 10 entails that it is
equivalent to prove the two following assertions

i) $\sup_{n \ge 1} \mathbb{E}\, d^p(X_n, X) < \infty$ (bounded in $L^1$)

ii) $\forall \varepsilon > 0$, $\exists \alpha > 0$ such that $\forall A \in \mathcal{A}$, $\big(\mathbb{P}(A) < \alpha \implies \int_A d^p(X_n, X)\, d\mathbb{P} < \varepsilon\big)$
(equicontinuity).

Assertion i) is a consequence of the following bound, since $\mathbb{E}\, d^p(X_n, X) \le \mathbb{E}\, d^p(X_n, Z_n)$:

$$\begin{aligned}
\mathbb{E}\, d^p(X_n, X) &\le C_p \big[\mathbb{E}\, d^p(X_n, Y_n) + \mathbb{E}\, d^p(Y_n, x) + \mathbb{E}\, d^p(x, Z_n)\big] \\
&= C_p \big(W_p^p(\mu_n, \nu) + W_p^p(\nu, \delta_x) + W_p^p(\delta_x, \mu)\big) \\
&\le M < \infty && \text{since we assumed that } W_p(\mu_n, \nu) \to W_p(\mu, \nu).
\end{aligned}$$

To prove Assertion ii), set $A \in \mathcal{A}$. We have that

$$\mathbb{E}\, d^p(X_n, X) \mathbf{1}_A \le C_p \big[\mathbb{E}\, d^p(X_n, Y_n) \mathbf{1}_A + \mathbb{E}\, d^p(Y_n, x) \mathbf{1}_A + \mathbb{E}\, d^p(x, Z_n) \mathbf{1}_A\big]. \tag{10}$$

Note that $d^p(X_n, Y_n)$, $d^p(Y_n, x)$ and $d^p(x, Z_n)$ converge a.s. towards respectively $d^p(X, Y)$,
$d^p(Y, x)$ and $d^p(x, Z)$. Their $L^1$ norms converge also: for the first term by assumption,
and for the other two since the laws of $Y_n$ and of $Z_n$ do not depend on $n \ge 1$. Hence, using
Lemma 9, they converge in $L^1$ and thus are equicontinuous sequences. This implies that for
all $\varepsilon > 0$, there exists $\alpha > 0$ such that

$$\mathbb{E}\, d^p(X_n, Y_n) \mathbf{1}_A + \mathbb{E}\, d^p(Y_n, x) \mathbf{1}_A + \mathbb{E}\, d^p(x, Z_n) \mathbf{1}_A < 3\varepsilon$$

for any $A$ such that $\mathbb{P}(A) < \alpha$.

Hence inequality (10) implies that $(d^p(X_n, X))_{n \ge 1}$ is equicontinuous. Since it is also
bounded in $L^1$, this sequence is uniformly integrable, which proves the result.